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Sammary 


This  paper  describes  a  compilv-based  approach  to  effident  cbeclqiointing  for  process  recovery.  Our 
in^ilementation  is  transparent  to  both  the  programmer  and  the  hardware.  The  compiler-genoated  sparse 
potential  checkpoint  code  maintains  the  desired  checkpoint  interval.  Adaptive  checkpointing  reduces  the  size 
of  the  checkpoints.  Training  is  used  to  select  low-cost,  high-coverage  potential  checkpoints.  The  problem 
of  selecting  potential  checlqioints  is  shown  to  be  NP-complete  and  a  heuristic  algorithm  is  introduced 
that  determines  a  quick  suboptimal  solution.  These  compiler-assisted  checkpointing  techniques  have  been 
implemented  in  a  modified  version  of  the  GNU  C  (GCC)  compiler.  Experiments  involving  the  CATCH 
GCC  compiler  on  a  Sun  SPARC  workstation  are  summarized. 
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1.  Introduction 

1.1.  Background 

Checlq)ointing  and  rollback  recovery  is  a  standard  technique  used  in  providing  a  fault-tolerant  computing 
environment  V^th  checkpointing,  if  a  fault  is  detected  by  the  computer  system,  the  application  is  then  rolled 
back  to  the  most  recent  checkpoint  where  the  state  of  the  system  is  known  to  be  correct.  Checkpointing 
schemes  can  be  broadly  divided  into  two  categories,  depending  on  how  the  checkpoint  information  is 
maintained.  One  scheme,  called /u//  checkpointing,  involves  saving  the  entire  state  space  of  the  application 
for  every  checlqioint  llie  other  scheme,  called  incremental  checiqtointing,  creates  major  checlqjoints 
which  save  the  entire  state  space  and  minor  checlqmints  which  save  the  difference  between  the  current  and 
previous  state  space.  This  paper  examines  the  application  of  compiler-based  techniques  for  implementing 
full  checkpointing. 

Researchers  have  made  use  of  compilers  to  help  fadlitate  fault-tolerant  schemes  in  a  variety  of  ways. 
Alewine  etcU.  [  1  ]  implemented  a  compiler-assisted  multiple-instructionietry  approach  which  could  tolerate 
a  wide  class  of  code  execution  failures.  The  compiler  generates  code  in  such  a  way  as  to  ftiiminatp, 
possible  on-padi  and  branch  hazards  that  might  result  when  instruction  retry  is  poformed.  Long,  Futdis,  and 
Abraham  [2]  created  a  compiler-assisted  approach  for  static  checkpoint  insertion  which  maintain.s  the  desired 
checkpoint  int^al  and  reproducible  checkpoint  locations.  Balasubramanian  and  Baneijee  [3]  utilize  the 
compiler  to  aid  in  synthesizing  algorithm-based  checking  techniques  for  general  applications.  Wilken  and 
Kong  [4]  developed  a  method  of  concurrent  checking  of  memory  accesses  by  means  of  a  signature  that  is 
embedded  into  different  types  of  data  structures. 

Research  pertaining  to  classical  checkpointing  and  rollback  recovery  is  heavily  documented.  However, 
there  is  little  published  work  on  utilizing  compilers  in  generating  checlqxrints.  Related  to  this  area  is  the 
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work  done  by  Chandy  and  Ramamoorthy  [S].  They  formulated  a  gn^h-theoretic  method  which  guides  a 
programmer  in  the  decision  of  where  to  insert  checkpoints  in  a  program.  The  program  is  broken  down  into 
distinct  tasks.  Based  on  factors  such  as  execution  time,  checkpoint  time,  and  failure  rates,  their  algorithm 
can  determine  optimal  locations,  between  tasks,  for  checkpoint  placement.  By  identifying  the  optimal 
checkpoint  locations,  the  maximum  checkpoint  time,  the  expected  checkpoint  time,  or  the  expected  run  time 
can  be  minimized-  A  similar  approach  was  pursued  by  Ibueg  and  Babaoglu  [6],  and  Upadhyaya  and  Saluja 
[7].  The  problem  with  this  s^roach,  as  pointed  out  by  Qtandy  and  Ramamoorthy  [51,  is  that  the  cost  of 
analysis  incurred  by  the  programmer  can  be  excessive. 

1,2.  The  CATCH  GCC  Compiler 

This  paper  details  three  different  compiler-assisted  techniques  for  implementing  fuU  checkpointing.  The 
checlqpointing  techniques  are  called  spmpiler-g^sisted  yschniques  for  cbfickpointing  (CATCH).  The  check¬ 
pointing  sdieme  is  transparent  to  the  programmer  and  requires  no  modifications  to  the  computer  architec¬ 
ture.  The  GNU  C  compiler  (GCC)  version  2.4.3  [8]  was  modified  in  order  to  implement  the  checkpointing 
schemes.  Hgure  1  shows  the  modifications  made  to  the  compiler.  Highlighted  areas  represent  additions 
to  the  original  compiler.  The  purpose  of  the  CATCH  filter  is  to  insert  code  into  the  user  program  with 
supporting  fimctions  defined  in  the  CATCH  libraries.  The  experiments  detailed  in  this  paper  were  run  on  a 
Sun  SPARC  workstation  (sparc2,  UNIX  SunOS  Release  4.1.3). 

2.  Checkpoint  Interval  Maintenance 

This  section  describes  how  the  compiler  inserts  sparse  potential  checkpoints  in  the  program  to  maintain 
the  desired  checkpoint  interval.  An  optimal  checkpoint  interval  can  be  determined  by  following  one  of  the 


documented  standard  analysis  methods  [9-17|. 

2.1.  Potential  Checkpoints 

The  fundamental  aspect  of  our  approach  is  to  have  the  compiler  insert  code  into  the  user  program  so  that 
the  resulting  executable  can  periodically  establish  checkpoints.  Polling  is  used  to  determine  el^sed  time, 
since  not  all  general  purpose  multi-user  systems  have  user  accessible  interrupt  timers.  Polling  also  keeps 
the  implementation  easier  to  port  to  other  computer  systems.  The  code  that  establishes  checkpoints  is  a 
subroutine  named  jcheckpoint{ ).  It  is  comparable  to  Ibylor  and  Wright’s  [18]  -Establish{ )  subroutine  and 
the  rfork{)  subroutine  desrxibed  by  Smith  and  loannidis  [19].  One  notable  difference  from  the  rfork{) 
sp{m)ach  is  that  Jcheckpoint{)  does  not  require  fork{)  to  create  a  child  process  before  establishing  a 
checlqmint 

An  inherent  difficulty  of  this  approach  is  finding  the  proper  places  in  the  code  to  take  checkpoints,  so 
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as  to  maintain  a  regular  checkpoint  interval.  For  example,  in  the  following  small  block  of  code  the  compiler 
may  not  be  able  to  ascertain  where  to  insert  the  jcheckpoint{ )  subroutine  due  to  the  fact  that  the  loop  upper 
bound  is  unknown  at  compile  time. 

scanf  ( *%d'' ,  &bound)  ; 

sum  =  0; 

for  (i  =  0;  i  <  bound;  i++)  { 
sum  +=  f ( i ) ; 

} 

printf ( •%d\n* ,  sum) ; 

If  the  compiler  inserts  the  jcheckpoint{)  subroutine  inside  the  loop,  then  the  time  required  to  establish  a 
checkpoint  might  be  prohibitive.  On  the  other  hand,  if  the  compilo’  places  the  u:heckpoint{)  call  after  the 
loop,  the  desired  checkpoint  interval  may  be  exceeded  if  the  loop  code  executes  for  a  long  period  of  time. 

The  CATCH  GCC  compiler  resolves  this  dilemma  by  locating  places  in  the  program  where  the  clock 
should  be  examined  in  order  to  decide  whether  a  checkpoint  needs  to  be  taken  or  not.  These  locations 
are  called  potential  checkpoints.  In  our  implementadon,  potential  cbec]qx)ints  are  at  the  beginning  of 
subroutines  and  at  the  first  line  inside  a  loop.  The  code  that  handles  the  potential  checlqwint  functionality  is 
a  subroutine  called  -potential().  The  jpotentialQ  subroutine  acts  somewhat  like  a  software  monitor  [20]. 
The  -potential{)  subroutine  p^orms  two  functions: 

1.  Maintaining  checlqwint  intervals:  keep  the  time  elapsed  between  established  checlqminis  as  close  to 
the  optimal  checkpoint  interval  (denoted  T)  as  possible. 

2.  Minimization  of  resources:  minimize  the  time  and/or  space  required  of  the  checkpointing  activities. 

Most  system  level  dieckpointing  implementations  address  only  the  first  point  and  maintain  a  consistent, 
rigid  checlq)oint  interval.  Yet,  as  will  be  shown  in  Section  3  and  Section  4,  by  relaxing  the  constraint  of  a 
rigid  checkpoint  interval,  a  significant  saving  of  system  resources  is  sometimes  possible. 

Every  encounter  of  a  potential  checkpoint  allows  the  user  program  to  take  a  checkpoint.  If  the  time 
since  the  last  established  checlqmint  exceeds  T,  then  a  checkpoint  is  established  at  that  encountered  potential 
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checkpoint.  Since  potential  checkpoint  locations  are  frequently  encountered  (on  the  order  of  hundreds  of 
microseconds)  and  T  is  usually  on  the  order  of  minutes,  the  checkpoint  interval  is  consistently  maintained. 

22.  Sparse  Potential  Checkpoints 

Even  though  by  placing  .potential()  at  the  beginning  of  subroutines  and  within  loops  helps  to  maintain 
a  rigid  checkpoint  interval,  nonetheless  there  is  a  high  performance  overhead  in  the  frequent  polling  that 
results,  lb  reduce  the  excessive  polling,  and  hence  reduce  the  performance  overhead,  a  technique  called 
j  sparse  potential  checkpoint  is  used.  Ihe  foUowing  two  line  code  segment  replaces  the  original  call  to 

.potentialQ: 

i 

I 

if  {--counter  <=  0) 

^potential ( ) ; 

where  counter  is  initially  set  to  a  reduction  factor  1.  As  aresult,  only  one  call  to  jpotential{),  per  I  potential 
checkpoint  encounters,  is  generated.  The  average  diec]qx)int  interval  is  stiU  maintained  close  to  T,  since 
the  time  between  sparse  potential  checkpoints,  even  with  I  on  the  order  of  thousands,  is  relatively  short. 

Thble  I  outlines  the  results  of  an  experiment  comparing  the  cost  of  the  original  and  sparse  potential 
checlqx)int  code  on  a  Sun  SPARC.  The  first  column  is  the  execution  time  of  the  UNIX  time()  library 
subroutine,  which  consumes  most  of  the  time  in  .potential{)  (second  column).  The  third  column  represents 
the  time  required  to  execute  the  sparse  potential  d^ckpoint  code  if  the  counter  has  not  yet  reached  zero. 
Otherwise,  the  sparse  potential  checkpoint  code  takes  37.19  ps  to  execute.  The  av^age  cost  of  the  sparse 
potential  checlqxjint  code  can  be  calculated  from  the  equation  0.677  +  22J2.  as  an  example,  when  I  is 
set  to  1000,  the  cost  of  the  sparse  potential  checkpoint  code  (fourth  column)  is  significantly  less  than  the 
original  potential  checkpoint  code  (second  column).  The  lower  cost  is  prominent  in  reducing  the  polling 


time  overiiead. 
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l^le  I:  Cost  comparison  of  the  original  and  sparse  potential  checkpoint  code  (/xs) 


tini«0 

Is  oo 

Is  1000 

36.51 

37.19 

0.677 

0.704 

2J.  Experimental  Results 


IWo  factors  account  for  the  run-time  overhead  of  a  program  compiled  by  the  CATCH  GCC  compiler.  The 
first  factor  is  the  time  required  to  perform  the  actual  checkpoint  saves.  The  second  factor  is  attributed  to  the 
polling  time,  called  the  potential  checlqwint  time.  The  storage  device  used  in  the  experiments  was  an  IPI 
disk  on  a  Sun  file  servo^  with  a  6  Mbytes/sec  access  time.  In  all,  ten  benchmark  programs  were  compiled 
with  the  CATCH  GCC  compilo’  and  subsequently  rm.  Specifically,  the  following  are  the  benchmarks  used, 
some  of  which  were  first  converted  from  FORTRAN  to  C  wifii  the  £2c  translator  developed  by  AT&T  Bell 
Laboratories  and  Bellcore. 


RKF:  uses  die  Runge-Kutta-Fehlbeig  method  [21]  in  solving  the  ordinary  diffnendal  equation  y'  =  x  +  y, 
y(0)  =  2.  The  computations  performed  are  floating  point  intensive. 

CONYLV:  finds  the  convolution  of  1024  signals  with  one  response  using  the  FFT  algorithm  [22].  Each 
signal  length  was  256  bytes  while  the  length  of  the  response  was  99  bytes.  The  entire  data  set  was 
greater  dian  1  Mbytes,  towevo;  the  memory-resident  data  set  occupied  only  a  few  Kbytes. 

LUDCMP:  is  an  LU  decomposition  algorithm  [21]  that  decomposes  100  randomly  generated  matrices 
with  size  uniformly  distributed  between  1  and  100.  A  characteristic  of  the  benchmark  is  that  memory 
blocks  are  allocated  before  a  new  matrix  is  read  in  and  then  ate  subsequently  deallocated  after  the 
result  is  written  out. 

ESPRESSO:  is  a  SPEC  benchmark  program  [23]  developed  at  the  University  of  California  at  Bericeley 
that  performs  boolean  fiinction  minimization.  It  is  an  integer  benchmark  with  many  short  loops  and 
recursive  functions. 

MDLJSP2:  obtains  the  solution  to  the  equation  of  motion  for  a  model  of 500  atoms  interacting  through  the 
idealized  Lenard- Jones  potential.  The  program  is  from  the  collection  of  SPEC  benchmark  programs. 

FPPPP:  is  a  quantum  chemistry  SPEC  benchmark  which  heavily  involves  double-precision  floating  point 
operations.  The  code  is  characterized  by  its  very  large  basic  blocks. 

ORA:  is  a  CPU  intensive  floating  point  SI%C  benchmark.  ORA  traces  rays  through  an  optical  system 
comprised  of  different  surfaces.  Only  a  small  amount  of  I/O  is  performed. 
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MATRIX300:  is  one  of  the  SPEC  benchmarks  that  performs  a  variety  of  matrix  multiplications,  including 
matrix  transposes  using  Unpack  routines  SGEMV,  SGEMM,  and  SAXPY,  on  matrices  on  the  order 
of  300. 

TOMCATV:  is  a  mesh  generation  program  and  is  also  one  of  the  SPEC  benchmarks.  The  program  involves 
very  little  I/O. 

NASA7:  is  one  of  the  SPEC  benchmarics  and  is  comprised  of  seven  kernels.  Each  kernel  generates  its  own 
input  data,  performs  the  kernel,  and  compares  the  result  for  correctness.  The  kernels  involve  a  variety 
of  computations  dealing  with  arrays  and  matrices. 

Tbble  n  shows  the  results  of  the  first  experimental  run  involving  two  versions  of  each  program.  The 
“original”  version  represents  the  program  compiled  by  the  unmodified  GCC  compiler.  The  “normal”  version 
represents  the  program  compiled  by  the  CATCH  GCC  compiler.  The  parameters  for  the  “normal”  version 
are  as  follows.  Each  benchmark  has  I  =  1000.  Fbr  the  shorter  benchmarks  like  RKF,  CONVLV,  LUDCMP, 
and  ESraESSO,  the  value  of  T  was  set  to  10  seconds.  MDLJSP2  had  T  =  120  seconds,  both  FPPPP 
and  ORA  had  T  =  150  seconds,  MATRIX300  had  T  =  180  seconds,  TOMCATV  had  T  =  90  seconds, 
and  lastly  NASA?  had  T  =  900  seconds.  Tb  determine  how  much  time  can  be  attributed  to  polling  in 
the  “normal”  version,  another  version  of  the  program  (not  shown  in  the  table)  was  compiled  with  T  =  oo 
(actually,  a  very  large  number).  The  consequence  is  that  this  version  will  not  establish  any  dieckpoints, 
hence  the  longer  run  time  can  be  solely  accounted  for  by  the  potential  checlqwints. 

The  checkpoint  time  is  expressed  as  the  product  of  two  numbers.  The  first  number  is  the  average  time 
used  in  establishing  a  single  checkpoint  while  the  second  number  represents  the  total  number  of  checkpoints 
taken  during  one  nm  averaged  over  multiple  test  runs.  The  total  time  overhead  merit  is  expressed  as  a 
percentage  of  the  execution  time  of  the  “original”  version.  For  most  of  the  benchmarks,  the  oveihead  is 
near  or  below  the  10%  mark.  The  time  overhead  for  some  of  the  benchmarks,  though,  is  not  satisfactory 
and  the  methods  described  in  the  following  sections  must  be  employed  to  make  the  overhead  tolerable. 
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Table  II:  Performance  of  the  “original”  and  “normal”  versions 


IMOgram 

vctskm 

total 

CXCCUtiOB 

date 

(secs) 

average 

checkpolBt 

size 

(bytes) 

potential 

checkpoint 

ttnie 

(secs) 

checkpoint 

time 

(secs*#) 

total 

time 

overhead 

(%) 

RKF 

original 

21.0 

- 

- 

- 

- 

normal 

22.8 

19424.0 

1.00 

0.40*2 

8.57 

CONVLV 

original 

17.8 

- 

- 

- 

- 

normal 

20.6 

49729.6 

12Q 

0.30*2 

15.73 

LUDCMP 

original 

24.0 

- 

- 

- 

- 

normal 

32.0 

108840.3 

7.00 

0.33*3 

33.33 

EsraEsso 

original 

27.0 

- 

- 

- 

- 

normal 

48.2 

331955.2 

1920 

0.48*4.2 

78.52 

MDUSP2 

original 

530.8 

- 

- 

- 

- 

normal 

609.4 

211653.9 

77.80 

0.16*5 

14.81 

FPPPP 

original 

645.6 

- 

- 

- 

- 

normal 

658.4 

359358.4 

12.00 

0.20*4 

1.98 

ORA 

original 

727.0 

- 

- 

- 

- 

normal 

733.6 

55068.0 

5.20 

0.35*4 

0.91 

MATRDOOO 

original 

1053.0 

- 

- 

- 

- 

normal 

1173.4 

2189567.7 

113.40 

1.17*6 

11.43 

TOMCATV 

original 

377.6 

- 

- 

- 

- 

normal 

405.2 

3730204.0 

16.00 

2.90*4 

7.31 

NASA? 

original 

3662.0 

- 

- 

- 

- 

normal 

3879.0 

2924748.0 

212.80 

1.05*4 

5.93 

3.  Checkpoint  Size  Reduction 

3.1.  Data  Ccmipression 

One  metbod  of  reducing  cbeckpoint  size  is  to  perform  data  compression  on  the  checlqjoint  files  before  they 
are  written  out.  We  implemented  an  LZW  data  compression  [24],  which  provided  significant  checlq)oint 
size  reduction  that  averaged  48.9%  over  all  checkpoints  for  all  the  benchmark  programs.  However,  the 
compression  algorithm  is  computationally  intensive.  As  a  resxilt,  we  developed  alternative  run-time  methods 


for  enhancing  checkpoint  performance. 
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Figure  2:  Variation  in  checkpoint  size  for  the  “small’’  version  of  LUDCMP 

32,  Adaptive  Checkpointing 

The  motivatioii  for  an  ads^tive  dieckpointing  sct^me,  which  allows  small  deviations  in  T,  is  illustrated 
in  Rgure  2.  Displayed  is  the  checkpoint  size  trace,  over  time,  of  a  “small”  version  of  LUDCMP.  which 
exhibits  large  variations  in  memory  usage.  As  an  example,  suppose  three  checkpoints  are  taken  at  time  U , 
U,  and  ts  with  corresponding  sizes  s*,  ss,  and  S2.  If  ti^  first  checkpoint  could  be  delayed  to  tz,  the  second 
shifted  back  to  tj,  and  the  third  delayed  until  te,  then  the  resulting  checkpoints  would  be  reduced  to  size  s^, 
si,  and  S]  respectively.  Adaptive  checkpointing  is  the  method  of  allowing  a  window  of  opportunity  to  take 
a  checkpoint  in  order  to  take  advantage  of  variations  in  memory  use. 

The  adaptive  checkpointing  scheme  uses  a  decision  algorithm  embedded  in  .potential().  The  algo¬ 
rithm  can  be  viewed  pictorially  as  a  transition  graph,  as  seen  in  Figure  3.  The  transition  graph  is  constructed 
dynamically  during  program  execution.  When  a  potential  checkpoint  is  first  encountered,  a  new  node  in 
the  graph  is  created;  otherwise  the  corresponding  node  is  located.  Associated  with  each  node  is  a  cost  (the 


 J 


Current  Cost  =  100,000 
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Figure  3:  Adaptive  checkpointing  transition  graph 


number  in  the  box  below  each  node  in  Figure  3),  which  in  our  implementation  is  the  average  size  of  the 
checlqxiint  that  would  be  established  at  that  particular  potential  checlqxiint.  Each  edge  has  a  transition 
probability,  which  represents  the  probability  of  encountering  next  the  potential  checkpoint  attached  to  that 
edge.  Both  the  cost  and  transition  probability  are  updated  during  a  potential  checkpoint  encounter. 

There  are  two  factors  that  guide  the  decision  of  when  to  take  a  checkpoint:  time  (T,  w,  and  the  current 
time)  and  cost  (the  current  cost  and  the  estimated  ne  cost).  The  variable  w  is  the  checkpointing  window 
size,  which  delimits  the  range  of  time  where  a  checkpoint  can  be  taken.  The  estimated  next  cost  is  simply 
the  weighted  average  of  the  costs  of  all  the  immediate  successor  nodes  of  the  current  node.  The  guideline 
for  establishing  a  checkpoint  is  stated  as: 


if  (the  time  since  the  last  established  checkpoint  >=  T) 
_checkpoint ( ) ; 

else  if  ( (T  -  w  <=  time  since  the  last  established  checkpoint) 
and  (the  estimated  next  cost  >  the  current  cost)) 
_checkpoint ( ) ; 
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else 

wait  until  the  next  encounter  of  a  potential  checkpoint 
to  see  if  a  checkpoint  should  be  established; 

lb  illustrate  the  functionality  of  the  transition  graph  in  Figure  3,  consider  an  example  where  the  current 
potential  checkpoint  is  node  1  and  the  current  cost  is  100,000.  If  the  elapsed  time  since  the  last  checkpoint 
falls  in  the  window  (i.e.,  between  T-w  and  T),  the  decision  to  take  a  checkpoint  will  be  postponed  because 
the  estimated  next  cost  (96,000)  is  lower  than  the  current  cost.  If,  on  the  other  hand,  the  elapsed  time  was 
equal  to  or  greater  than  T,  then  the  algorithm  would  immediately  force  a  checkpoint  to  be  taken. 

3  J.  Experimental  Results 

Hiible  in  summarizes  the  results  of  the  "adaptive”  version.  For  the  experiments,  w  =  1  second  for 
RKF,  CONVLV,  LUDCMP,  and  ESPRESSO  and  w  =  6  seconds  for  the  remaining  benchmarks.  In  our 
experiments,  values  of  w  less  than  10%  of  T  maintained  a  consistent  checkpoint  interval.  The  “adaptive" 
version  produced  smaller  checkpoints  for  only  LUDCMP.  The  other  benchmarks  are  written  in  such  a  way 
that  memory  use  is  relatively  constant  As  a  result,  the  "adaptive”  version  will  not  encounter  locations 
where  memory  use  is  markedly  lower  and  hence  be  able  to  establish  a  smaller  checkpoint.  The  increase  in 
checlq)oint  size  is  due  to  the  transition  graph  information  that  must  also  be  saved. 

4.  Potential  Checkpoint  Time  Reduction 

This  section  outlines  the  training  method  that  reduces  the  polling  time  of  the  potential  checkpoints.  The 
training  method  is  inspired  by  Chang  and  Hwu’s  profiling  technique  [26].  A  new  subroutine  called 
^napshot{ ),  which  maintains  statistical  information  on  the  checkpoint  time  and  potential  checkpoint  time, 
replaces  the  sparse  potential  checlq)oint  code.  The  execution  time  overhead  for  the  .snapshot()  subroutine 
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Table  III:  Performance  of  the  “adaptive”  version 


program 

total 

execution 

time 

(secs) 

average 

checkpoint 

size 

(bytes) 

potential 

checkpoint 

time 

(secs) 

checkpoint 

time 

(secs*#) 

total 

time 

overhead 

(%) 

RKF 

22.8 

19468.0 

1.00 

0.40*2 

8J7 

CONVLV 

19.6 

50052.8 

1.20 

0.30*2 

10.11 

LUDCMP 

31.8 

64551.2 

120 

020*3 

32.50 

ESntESSO 

60.0 

332252.8 

30.12 

0.48*6 

12222 

MDUSP2 

631.8 

213554.7 

100.00 

020*5 

19.03 

FPPPP 

676.6 

361252.8 

30.00 

025*4 

4.80 

ORA 

740.2 

55324.0 

11.80 

0.35*4 

1.82 

MATRDGOO 

1188.0 

2189956.0 

127.75 

1.17*62 

12.82 

TOMCATV 

405.6 

3730716.0 

16.40 

2.90*4 

7.42 

NASA? 

3916.4 

2928744.0 

250.20 

1.05*4 

6.95 

is  30.87  /iS  per  encounter.  Tbe  training  technique  is  comprised  of  three  steps: 


1.  Compile  a  “training”  Version. 

2.  Run  tbe  “training”  version  to  acquire  a  profile  of  the  checkpoint  size  over  time. 

3.  Compile  the  “trained”  version  of  the  program,  utilizing  the  information  gathered  in  step  2. 

4.1.  Sampling 

Tbe  purpose  of  the  ^napshot{)  subroutine  is  to  maintain  for  every  potential  checlqx)int  two  statistics.  It 
records  the  number  of  times  a  potential  checkpoint  is  encountered  in  addition  to  the  average  checkpoint 
size,  as  if  checkpoints  were  established  at  this  potential  cbeclq)oint  location.  The  information  is  written  out 
to  an  accounting  file  after  every  I  encounters  of  potential  checkpoints.  Also,  the  “training”  vosion  records 
in  the  accounting  file  the  total  execution  time  of  the  program. 

The  sampling  phase  (step  2)  is  pictoiially  represented  in  Figure  4,  which  shows  a  trace  of  the  checlqioint 
size  for  the  “small”  version  of  the  LUDCMP  benchmark.  The  trace  is  subdivided  into  ten  time  intervals 
labeled  ai  through  aio,  each  containing  18300  encounters  of  potential  checkpoints  (i.e.,  I  =  18300).  The 
labels  S4  and  5i4  represent  the  fourth  and  fourteenth  potential  checkpoints  (the  remaining  labels  are  omitted 
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Figure  4:  Sampling  the  checkpoint  size  by  potential  checkpoints 

for  clarity).  The  lines  emitting  from  54  and  5i4  show  the  locations  where  the  two  respective  potential 
checkpoints  are  encountered.  As  an  example,  if  a  checlq)oint  should  be  established  at  ^4  or  Su,  it  would 
be  advantageous  to  select  5^  since  the  average  checkpoint  size  is  less  than  that  of  54. 

4Jt.  Analysis 

The  final  phase  of  the  training  method  is  the  analysis  phase  (step  3),  where  the  compiler  processes  the 
accounting  information  provided  by  the  sampling  phase.  There  are  two  fimctions  to  be  performed  in  this 
phase:  1)  assign  a  cost  to  every  potential  checkpoint,  and  2)  determine  a  minimum  cost  subset  of  all  potential 
checlq)oints  that  can  fully  cover  the  time  intervals.  The  process  of  finding  a  minimum  cost  subset  is  aided  by 
the  use  of  a  bipartite  graph  as  illustrated  in  Figure  5.  The  nodes  on  the  left-hand  side  represent  the  potential 
checkpoints  Si,  I  <  i  <  m.  The  nodes  on  the  right-hand  side  represent  the  time  intervals  aj,  1  <  J  <  n. 
A  cost  pair  {Pi,  Ci)  is  calculated  and  assigned  to  each  Si,  where  Pi  is  the  polling  cost  and  Ci  is  the  cost  of 
establishing  a  checkpoint  for  potential  checkpoint  i.  A  line  joins  an  5i  to  an  aj  if  that  Si  is  encountered 


(100,200)  (  5i 


(150,140)  t 


(300,160)  (  53 


(120.300) 


Figure  5:  Weighted  Minimum  Cover  (WMC)  problem 


during  that  time  interval.  The  cost  of  a  subset  X  =  {5j*|l  <k<  m!)  is  defined  as; 


m'  p.  n 

co,t(x) = 5:  fs.  + 

where  4  €  {1, . . . ,  m}.  The  first  term  of  the  cost  fimction  is  the  total  polling  time,  contributed  by  the 
individual  potential  checkpoints.  The  second  term  is  based  on  the  premise  that  the  more  frequently  a 
potential  checkpoint  is  encountered,  the  greater  the  possibility  that  a  checlqwint  wiU  be  established  there. 
Hence,  the  total  checkpoint  time  (second  term)  is  a  weighted  average  of  the  checkpoint  time  of  each  potential 
checkpoint 

For  a  particular  checlqmint  t,  the  value  of  P,  can  be  calculated  as  the  product  of  the  time  overhead 
of  the  sparse  potential  checkpoint  code  and  the  number  of  encounters.  The  value  of  C,  is  calculated  as  the 
product  of  the  dieckpoint  size  and  the  number  of  checkpoints  established  during  program  execution.  The 
number  of  checkpoints  established  is  determined  by  dividing  the  net  execution  time  (total  execution  Hnift 
minus  the  time  spend  in  ^napshot{))  by  T.  The  time  overhead  value  of  the  sparse  potential  checkpoint 
code  needed  in  the  Pi  calculation  is  obtained  from  Thble  I. 
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4  J.  Wdghted  Minimum  Cover  (WMC)  Problem 

After  the  costs  are  calculated,  the  hnal  task  of  finding  a  minimum  cost  subset  of  potential  checkpoints  that 
covers  all  the  time  intervals  can  be  addressed.  This  task  is  called  the  Weighted  Minimum  Cover  (WMC) 
problem.  As  an  example,  the  minimum  cost  subset  in  Hgure  5  is  comprised  of  the  two  shaded  potential 
checkpoints. 

Before  the  WMC  problem  is  formally  stated,  certain  variables  need  to  be  defined.  Consider  each  5, 
a  subset  of  A  =  {aj|l  <  j  <  n}.  Let  5  =  {(5i,  Pi,Ci)15i  C  A,  Pi  and  Ci  are  positive  real  numbers, 
1  <  1  <  m},  /  a  polynomial  time  cost  function  with  bounded  value  ranging  from  2*^  to  non-negative  real 
numbers,  and  if  is  a  positive  real  number.  A  set  ^  C  5  is  considered  a  cover  of  A  if  {u^,  =  A|  for  all  5, 
in  if}. 

The  WMC  decision  problem  can  be  formally  stated  as  follows:  Given  a  4-tuple  {A,  S,  f,  K),  does 
there  exist  a  subset  X  C  S  such  that  is  a  cover  of  A  and  f{X)  <  Kl  There  might  be  more  than  one 
unique  subset  that  satisfies  the  previously  stated  problem  constraints.  With  this  fact  in  mind,  a  corresponding 
WMC  optimization  problem  can  be  stated  as  follows:  Given  a  3-tuple  (A,  5,  /),  find  a  subset  X  C  S  such 
that  is  a  cover  of  A,  and  f{X)  <  f{Y)  for  ev«y  Y  that  is  a  cover  of  A.  By  proving  that  the  WMC 
decision  problem  is  NP-complete,  it  is  straightforward  to  show  that  the  WMC  optimization  problem  is 
NP-hard. 

Property;  The  WMC  decision  problem  is  NP-complete. 

Proof;  To  prove  that  the  WMC  decision  problem  is  NP-complete,  it  is  adequate  enough  to  show  that 
it  is  in  NP-space  and  is  NP-hard.  For  a  given  X,  the  process  of  verifying  X  is  a  cover  of  A,  calculating 
f(X},  and  subsequently  comparing  the  result  to  K  can  all  be  accomplished  in  polynomial  time.  Thus,  the 
WMC  decision  problem  is  in  NP.  If  the  function  /  is  confined  to  positive  solutions,  i.e.  to  f{X)  =  (X|, 
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then  the  WMC  decision  problem  becomes  the  Minimum  Cover  decision  problem  which  is  known  to  be 
NP-complete  [27],  Therefore,  the  WMC  decision  problem  is  also  NP-hard. 


4.4.  A  Heuristic  Algorithm 

The  number  of  potential  cl^kpoints  in  a  program  can  range  from  tens  to  hundreds  and  the  number  of  time 
intervals  is  usually  of  the  order  of  thousands.  As  a  result,  the  process  of  finding  an  optimal  solution  to 
the  WMC  problem  can  be  computationally  expensive.  However,  obtaining  an  optimal  solution  is  neither 
critical  nor  necessary  because  the  costs  are  based  on  estimates,  and  secondly,  in  our  experiments  suboptimal 
solutions  perform  well. 

The  heuristic  algorithm  detailed  in  this  section  is  designed  with  two  factors  in  mind:  1)  some  potential 
checkpoints  are  infrequently  encountered  (e.g.,  those  at  the  begiiming,  the  end,  and  in  blocks  of  code 
dealing  with  error  cases),  and  2)  by  not  covering  all  the  time  intervals,  performance  may  be  only  slightly 
impacted.  The  algorithm  processes  and  selects  potential  checkpoints  based  on  a  coverage  factor  lower 
bound  (LOWMARK)  and  iqrper  bound  (HIGHMARK).  The  coverage  factor  of  a  subset  X  C  5  is  expressed 
as  the  number  of  time  intervals  covered  by  the  union  of  the  first  components  of  its  elements  divided  by  the 
total  numb«  of  time  intervals.  For  example,  in  Hgure  S,  the  subset  X  =  {52, 54}  has  a  coverage  factor  of 
1.0  sin<'e  all  the  time  intervals  (ai  through  as)  are  covered.  The  experiments  performed  used  a  value  of  0.5 
for  LOWMARK  and  0.9  for  HIGHMARK.  The  algorithm  is  described  as  follows: 


delete  all  potential  checkpoints  with  coverage  factor 
less  than  LOV4MARK; 

use  a  sorting  algorithm  to  order  potential  checkpoints  by 
the  cost  Pi+Ci; 

X  =  next  candidate  solution; 
while  (X  is  not  en^jty)  { 

if  (coverage  factor  of  X  >=  HIGHMARK) 
break; 

else 


X  =  next  candidate  solution; 
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} 

if  (X  is  empty) 

X  =  S; 

record  X  pennrainently; 

The  first  part  of  the  algorithm  eliminates  ail  the  potential  checkpoints  that  are  infrequently  encountered. 
The  remaining  potential  checkpoints  are  considered  high  coverage  (greater  than  LOWMARK).  Hence,  the 
task  of  finding  a  subset  of  potential  checlqwints  wifii  a  combined  coverage  factor  greater  than  HIGHMARK 
will  be  simplified,  though,  the  algorithm  could  still  take  exponential  time  in  the  worst  case.  The  candidate 
solution  may  be  generated  in  either  a  depth-first  or  breadth-first  manner. 

4,5.  Experimoital  Results 

There  are  two  aspects  to  consider  when  evaluating  the  performance  of  the  training  method.  The  first  aspect 
is  the  amount  of  system  resources  consumed  by  the  compilation  and  execution  of  the  “training”  vosion. 
The  second  aspect  to  evaluate  is  die  performance  of  the  actual  “trained”  version.  Thble  IV  summarizes 
the  results  of  the  first  aspect.  The  training  time  refers  to  the  total  execution  time  of  the  “training”  version 
and  the  analysis  time  is  the  time  to  run  the  heuristic  algorithm,  including  the  I/O  time  to  read  the  sampling 
data  from  the  accounting  file.  The  cover  set  shows  the  number  of  potential  checkpoints  retained  by  die 
heurisdc  algorithm.  The  coverage  provided  by  the  cover  set  is  represented  as  a  percentage.  For  large  and/or 
long  tunning  programs  (ESPRESb  ^  MDLJSP2,  MATRIX3(X),  TOMCATV,  NASA?)  the  training  time  is 
limited  by  a  compiler  opdon.  The  reason  for  the  time  restriction  is  because  the  accounting  file  can  grow 
dramatically  large  in  addition  to  the  training  time  becoming  excessively  long. 

The  results  of  the  “trained”  version,  along  with  the  other  versions,  are  shown  in  Thble  V.  The  input  data 
used  for  the  “trained”  version  experiments  differed  fiom  the  input  data  used  during  the  training  step.  For  all 
the  benchmarks,  except  MDLJSP2,  the  polling  time  attributed  to  the  potential  checkpoints  is  significandy 
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Tiible  IV:  Performance  of  the  “training”  version 


program 

«or 

potential 

checkpoints 

#or 

selected 

Intervals 

training 

time 

(secs) 

file 

size 

(bytes) 

analysis 

time 

(msecs) 

cover 

set 

coverage 

(%) 

RKF 

5 

2,577 

93 

113400 

233 

}) 

100 

CONVLV 

20 

4,317 

137 

525112 

1008 

6} 

94.8 

LUDCMP 

18 

10,502 

315 

979092 

1916 

{7.9} 

93.0 

ESPRESSO 

1055 

1,365 

60 

9614400 

16269 

{50,810,918} 

100 

MDUSP2 

83 

7,342 

360 

2639028 

4875 

{81} 

100 

FPPPP 

?5 

2,825 

336 

1633720 

2983 

{4} 

97.7 

ORA 

9 

3,648 

957 

277260 

525 

{6} 

99.1 

MAIKDGOO 

24 

5336 

180 

946812 

1691 

{20} 

98.2 

TOMCATV 

19 

2,057 

90 

227384 

10349 

{6,8,12,15,17} 

95.3 

NASA? 

149 

19,437 

900 

2705032 

4883 

97.0 

reduced  and  in  some  cases  completely  negligible.  The  reason  MDLJSP2  did  not  p^orm  better  is  that  the 
number  of  potential  checkpoints  was  only  reduced  from  83  to  81.  The  principal  feature  of  MDLJSP2  that 
caused  the  “trained”  version  to  perform  badly  is  the  lack  of  a  small  set  of  dominant  potential  checkpoints. 
Most  of  the  potential  checlqmints  were  retained  to  help  cover  the  time  intorvals. 

A  coirelation  exists  between  ceitain  code  dtaractmstics  and  the  performance  of  the  CATCH  com¬ 
piler  in  training  mode.  Code  with  a  significant  amount  of  recursion  usually  results  in  little  p^ormance 
improvement  The  potential  checkpoint  at  the  begiiuiing  of  the  recursive  subroutine  can  be  encountoed 
quite  often,  thus  heavily  increasing  the  polling  time.  Conversely,  code  comprised  of  loops  with  many  iVa- 
ations  and  large  loop  body  size  can  p^orm  well  with  the  CATCH  com^  ler.  In  theses  cases,  the  potential 
checlqwint  execution  time  will  not  dominate  the  execution  time  of  the  loop  iteration.  Even  with  nested 
loops  the  performance  can  be  satisfactory,  since  the  training  technique  can  usually  remove  the  majority  of 
the  potential  checkpoints  located  at  difierent  nested  levels  to  reduce  the  polling.  Bendimarks  that  possessed 
these  favorable  code  diaracteristics  and  performed  well  under  training  were  FPPPP,  ORA,  MATRIX3(X), 


TOMCATV,  and  NASA?. 
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'Rible  V;  Perfonnance  for  all  versions 


prognuB 

VCfSkM 

total 

cxaeatkM 

ttaM 

(a«) 

average 

chedqpoiat 

siae 

(bytes) 

potential 

cbeckpoint 

tfane 

(»«s) 

checkpoint 

time 

(secs*#) 

total 

tine 

overhead 

(%) 

RKF 

original 

21.0 

- 

- 

- 

- 

normal 

22.8 

19424.0 

1.00 

0.40*2 

8.57 

adaptive 

22.8 

19468.0 

1.00 

0.40*2 

8.57 

trained 

22.0 

19340.0 

0.00 

0.50*2 

4.76 

CONVLV 

original 

17.8 

- 

- 

• 

- 

normal 

20.6 

49729.6 

220 

0.30*2 

15.73 

adaptive 

19.6 

50052.8 

120 

0.30*2 

10.11 

trained 

192 

49780.0 

0.80 

030*2 

7.87 

LUDCMP 

original 

24.0 

- 

- 

- 

- 

normal 

32.0 

1088403 

7.00 

0.33*3 

33.33 

adaptive 

31.8 

645512 

720 

020*3 

32.50 

tndned 

27.4 

131320.8 

3.00 

020*2 

14.17 

ESPRESSO 

original 

27.0 

- 

- 

- 

- 

normal 

48.2 

3319552 

19.20 

0.48*42 

78.52 

ad^tive 

60.0 

332252.8 

30.12 

0.48*6 

12222 

trained 

32.6 

348672.0 

4.40 

0.60*2 

20.74 

MDUSP2 

original 

S30.8 

- 

- 

- 

- 

normal 

609.4 

211653.9 

77.80 

0.16*5 

14.81 

adaptive 

631.8 

213554.7 

100.00 

020*5 

19.03 

trained 

612.4 

211652.6 

80.80 

0.16*5 

15.37 

PPPPP 

original 

645.6 

- 

- 

- 

- 

normal 

658.4 

359358.4 

12.00 

020*4 

1.98 

adi^ve 

676.6 

361252.8 

30.00 

025*4 

4.80 

tndned 

6472 

359492.0 

1.00 

020*3 

0.25 

ORA 

original 

727.0 

- 

- 

- 

- 

normal 

733.6 

55068.0 

520 

0.35*4 

0.91 

adaptive 

740.2 

55324.0 

11.80 

0.35*4 

1.82 

trained 

728.8 

54844.0 

0.40 

0.35*4 

0.25 

MATR1X300 

original 

1053.0 

- 

- 

- 

- 

normal 

1173.4 

2189567.7 

113.40 

1.17*6 

11.43 

adaptive 

1188.0 

2189956.0 

127.75 

1.17*6.2 

12.82 

trained 

1058.8 

2189572.0 

0.00 

1.16*5 

035 

TC»1CATV 

original 

377.6 

- 

- 

- 

- 

normal 

4052 

3730204.0 

16.00 

2.90*4 

7.31 

adi^tive 

405.6 

3730716.0 

16.40 

2.90*4 

7.42 

trained 

386.0 

3730204.0 

0.40 

2.00*4 

2.22 

NASA? 

original 

3662.0 

- 

- 

- 

- 

normal 

3879.0 

2924748.0 

212.80 

1.05*4 

5.93 

adaptive 

3916.4 

2928744.0 

250.20 

1.05*4 

6.95 

trained 

3699.6 

2924788.0 

30.60 

1.75*4 

1.03 
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5.  Imptementatkm  of  the  CATCH  GCC  Compiler 

5.1.  CATCH  Filter  aod  CATCH  Libraries 

The  CATCH  filter  (Figure  1),  placed  between  the  parsing  and  optimization  stages,  works  on  an  intermediate 
language  representation  called  the  Register  lYanster  Level  (RTL).  The  CATCH  filter  executes  the  following; 

1 .  Inserts  the  subroutine  call  jprolog{ )  at  the  start  of  the  matn( )  fimction. 

2.  Inserts  the  sutnoutine  call  jepilog( )  at  the  end  of  the  main{ )  fimction. 

3.  Locates  loops  and  subroutines  to  insert  sparse  potential  checlqmint  code  as  outlined  in  Section  2  and 
Section  4  when  training  is  performed. 

4.  Replaces  traditional  library  calls  to  fopen(),  fclose{),  exit(),  and  abort{)  with  _open{),  .close{), 
JtiiQ,  and  JbortQ  respectively. 

5.  Inserts  variables  that  are  used  by  the  checkpointing  scheme. 

6.  ContrDlsthetrainiagmode(Section4). 

The  CATCH  library  contains  the  corresponding  definitions  for  the  subroutines  ^checkpoints), 
.prologS),  jepilogS),  .potentials),  .snapshots),  -open(),  jcloseS),  ~xitS),  and  jbortS).  In  addition,  an 
altered  version  of  mallocS )  and  the  data  compression/decompression  code  is  included.  The  implementation 
details  of  the  subroutines  ate  described  in  the  following  subsections. 

5.2.  Chcckptrinting 

The  jcheckpointS )  subroutine  saves  the  register  environment  and  records  the  current  position  of  each  open 
file.  A  reentry  address  located  towards  the  end  of  the  jcheckpointS )  subroutine  is  then  stored.  The  reentry 
address  is  the  location  were  execution  will  resume  at  recovery  time.  The  start  and  end  address  of  the  dataseg, 
whidi  contains  both  the  initialized  data  segment  and  the  uninitialized  bss  segment,  and  the  spseg,  which 
is  just  the  stack  segment,  ate  both  determined.  Finally,  the  layout  of  memory  usage  (i.e.,  the  start  and  end 
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Checkpoint  File  n  -  1 
Checkpoint  File  n 
New  Checkpoint  Hie 

Figure  6:  Checkpoint  files  and  index  files 

addresses  of  dataseg  and  spseg)  is  saved  in  a  checkpoint  file  along  with  the  actual  contents  of  dataseg  and 
spseg  (Hgure  6). 

A  difficult  issue  in  checkpointing  is  how  to  handle  the  state  space  associated  with  I/O  states  and  disk 
contents.  In  practice,  many  user-level  dieckpointing  methods  will  either  not  permit  I/O  operations  [19]  or 
only  opoate  with  a  specially  designed  I/O  subs^tem  [18].  Fratunately,  for  a  majority  of  non-interactive 
applications,  simple  stream  file  I/O  is  only  needed.  With  our  CATCH  implementation,  sequential  reads  and 
writes  of  stdin,  stdout,  stderr  m  disk  files  using  the  stdio  package  (furnished  in  a  UNIX  system)  are  allowed. 

In  order  to  tolerate  system  crashes,  a  minimum  of  two  checkpoint  files  (one  working  file  plus  one  or 
more  established  files)  need  to  be  maintained.  Eadi  checkpoint  file  has  a  unique  name,  6ose.ckpn,  where 
base  is  the  base  name  of  the  file  containing  the  main()  function  and  n  is  the  sequence  number  of  the 
checkpoint  file.  A  cbeclqwint  file  is  considered  valid  only  if  all  information  has  been  recorded  in  the  file 
and  it  has  been  successfully  closed. 

The  most  recent  checkpoint  file  is  kept  track  of  by  an  index.  A  setckp  utility  is  provided  that  allows 
the  user  or  an  automatic  restart  mechanism  to  modify  the  index  for  cases  involving  latent  error  detection. 
The  index  is  recorded  in  two  identical  index  files  (Hgure  6),  named  base.actO  and  6ase.actl.  Two  indexes 
are  maintained  so  that  a  system  crash  during  the  update  of  the  index  would  not  destroy  a  record  of  the  index 
(see  the  following  subsection).  The  steps  involved  in  oeating  a  new  checlq}oint  are  as  follows: 

1.  Create  the  nth  valid  checkpoint  file  Fn. 
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2.  Update  the  index  value  in  the  first  index  file  Iq  from  n  -  1  to  n. 

3.  Update  the  index  value  in  the  second  index  file  /]  from  n  -  1  ton. 

The  checkpoint  is  considered  established  when  the  content  of  lo  is  set  to  n  in  step  2. 

S3.  Recovery 

For  a  process  to  recover  from  an  error,  the  process  is  simply  restarted  by  reissuing  the  original  command. 
The  responsibility  of  reissuing  can  be  handled  by  the  uso'or  by  an  automatic  restart  mechanism  that  monitors 
all  submitted  but  unfinished  jobs.  Upon  reissuing  the  original  command,  the  process  will  immediately  enter 
the  jprologO  subroutine.  Subsequently,  the  index  value  stored  in  Iq  (denoted  io)  and  the  index  value  stored 
in  /]  (denoted  tj)  are  read  to  determine  which  checkpoint  file  to  use.  Index  file  Iq  (or  I\ )  is  valid  if  it  exists 

as  well  as  the  checkpoint  file  that  is  points  to,  (or  Fi^),  also  exists.  The  actions  taken  during  recovery 

are  guided  by  the  list  of  rules: 

1 .  If  neither  Iq  nor  /j  is  valid,  start  execution  fi:om  the  begitming. 

2.  If  Iq  is  valid  but  h  is  invalid,  use  Fi^  for  recovery. 

3.  If  Iq  is  invalid  but  I\  is  valid,  use  Fi^  for  recovery. 

4.  If  both  Iq  and  I\  are  valid  and  to  =  use  Fig  for  recovery. 

5.  Ifboth/o  and /]  are  valid  but  to  >  ti,usefio  for  recovery. 

6.  If  both  Iq  and  It  are  valid  but  to  <  m  ,  use  Fi^  for  recovery. 

Before  the  end  of  the  main{)  function,  the  jepilogQ  subroutine  is  encountered  which  deletes  both 
index  files  and  any  remaining  checlqmint  files.  All  abort{)  and  exit{)  subroutine  calls  are  replaced  by 
Jbort{)  -and  -xtt()  respectively,  which  will  call  .epilog()  before  terminating  the  program.  The  _open() 
subroutine  will  call  fopen{}  after  recording  the  file  name  and  access  mode  of  the  file.  The  .closeQ 
subroutine  replaces  the  traditional  fcloseO  call  whose  purpose  is  to  delete  the  information  recorded  by 
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Table  VI:  CATCH  compiler  switches 


switch 

default 

descriptioB 

-catch-n 

off 
n  =  1 

enable  the  CATCH  filter,  maintain  at 
least  n  checkpoint  files  during  normal 
execution 

-auto-n 

off 

n  =60 

enable  automatic  insertion  of  sparse 
potential  checkpoint  code  with  T  =  n 
seconds 

-pl-n 

on 

n  =  1000 

set  reduction  factor  1  =  n 

•show 

off 

display  the  checkpointing  and  recovoy 
activities 

<oni|»ess 

off 

enable  LZW  data  conqnessor 

-adaptive*n 

off 
n  =  0 

set  adaptive  checkpointing  window  size 
w  =  n 

-train-n 

off 

n  =  c» 

set  training  mode 

set  maximum  training  time  to  n 

.open{ ).  A  modified  version  of  malloc{ )  exists  that  returns  to  the  operating  system  the  deallocated  memory 
blocks  contiguous  to  the  unallocated  area  through  6rA;()  or  sbrk{). 


5.4.  User  Control 

The  CATCH  GCC  compiler  does  permit  the  user  to  explicitly  add  chedqminting  code  to  the  program,  if 
desired.  A  potential  checlqioint  location  can  be  created  by  simply  adding  the  subroutine  .potentialQ  to 
the  code.  The  compiler  will  not  add  any  sparse  potential  checkpoint  code  in  a  block  where  a  jpotential{ ) 
subroutine  is  explicitly  added.  Similaily,  a  checkpoint  location  can  be  created  by  adding  tte  subroutine 
^checkpoints)  to  the  program.  Conversely,  if  the  speed  of  a  particular  block  is  critical  to  a  program,  the 
subroutine  -nocheckpoint{ )  may  be  added  so  that  the  compiler  will  not  add  any  sparse  potential  checkpoint 
code  in  the  block.  Incidentally,  no  object  code  is  generated  for  the  .nocheckpoint( )  subroutine.  The 
different  checkpointing  schemes  with  their  corresponding  parameters  can  be  selected  by  the  user  by  means 
of  compiler  switches.  The  switches,  their  desaiption,  and  their  default  settings  are  shown  in  Table  VI. 
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6.  Conclusion 


A  means  of  performing  checlq)ointing  with  the  aid  of  a  compiler  was  detailed  in  this  paper.  The  compiler- 
assisted  ^proach  is  transparent  to  both  the  programmer  and  the  hardware,  implication  level  checkpointing 
inserted  sparse  potential  checkpoint  code  into  the  user  program.  For  programs  exhibiting  large  variations 
in  memory  usage,  the  adaptive  checkpointing  technique  reduced  checkpoint  size  as  well  as  checkpoint 
time  by  allowing  a  window  of  opportunity  to  take  a  checkpoint.  The  training  method  selected  low-cost, 
high-coverage  potential  checlqmints  which  had  the  effect  of  reducing  the  overall  polling  time  overhead. 
Experimental  results  revealed  that  the  CATCH  compilo^  works  well  on  code  consisting  of  loops  with  many 
iterations  and  large  loop  body  size. 
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